
    Testing whether linear equations are causal: A free probability theory approach

    We propose a method that infers whether linear relations between two high-dimensional variables X and Y are due to a causal influence from X to Y or from Y to X. The earlier proposed so-called Trace Method is extended to the regime where the dimension of the observed variables exceeds the sample size. Based on previous work, we postulate conditions that characterize a causal relation between X and Y. Moreover, we describe a statistical test and argue that both causal directions are typically rejected if there is a common cause. A full theoretical analysis is presented for the deterministic case, but our approach also seems to be valid for the noisy case, for which we additionally present an approach based on a sparsity constraint. The discussed method yields promising results for both simulated and real-world data.

    Distinguishing cause from effect using observational data: methods and benchmarks

    The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63 ± 10 % and an AUC of 0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method. Comment: 101 pages, second revision submitted to Journal of Machine Learning Research
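The additive-noise idea reviewed in the abstract above can be illustrated with a minimal sketch: regress each variable on the other and prefer the direction whose residuals look independent of the input. Everything below is an assumption made for illustration, not the benchmark's implementation: the function names are hypothetical, the regression is a plain polynomial fit, and the dependence score (absolute correlation between squared residuals and squared, centred inputs) is a crude stand-in for a proper statistical independence test.

```python
import numpy as np

def anm_dependence_score(cause, effect, degree=1):
    """Regress `effect` on `cause` with a polynomial and score how strongly
    the residuals still depend on the putative cause (lower is better)."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    # Crude proxy (assumption): correlation of squared residuals with squared,
    # centred inputs. A real analysis would use a proper independence test.
    centred = cause - cause.mean()
    return abs(np.corrcoef(residuals ** 2, centred ** 2)[0, 1])

def infer_direction(x, y, degree=1):
    """Return the direction whose residuals look less dependent on the input."""
    score_xy = anm_dependence_score(x, y, degree)
    score_yx = anm_dependence_score(y, x, degree)
    direction = "X->Y" if score_xy < score_yx else "Y->X"
    return direction, score_xy, score_yx

# Synthetic example: X causes Y through a linear map with uniform
# (non-Gaussian) additive noise, so only the forward model has
# residuals that are independent of its input.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
y = x + rng.uniform(-1.0, 1.0, size=2000)

direction, score_xy, score_yx = infer_direction(x, y)
print(direction, round(score_xy, 3), round(score_yx, 3))
```

On this synthetic pair the forward score should be close to zero and the backward score clearly larger, which is what the decision rule relies on. With Gaussian noise and a linear map the two directions would be indistinguishable, which is why the noise here is uniform.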

    On the links between sub-seasonal clustering of extreme precipitation and high discharge in Switzerland and Europe

    River discharge is impacted by the sub-seasonal (weekly to monthly) temporal structure of precipitation. One example is the successive occurrence of extreme precipitation events over sub-seasonal timescales, referred to as temporal clustering. Its potential effects on discharge have received little attention. Here, we address this topic by analysing discharge observations following extreme precipitation events either clustered in time or occurring in isolation. We rely on two sets of precipitation and discharge data, one centred on Switzerland and the other over Europe. We identify “clustered” extreme precipitation events based on the previous occurrence of another extreme precipitation event within a given time window. We find that clustered events are generally followed by a more prolonged discharge response with a larger amplitude. In particular, the probability of exceeding the 95th discharge percentile in the 5 d following an extreme precipitation event is up to twice as high when another extreme precipitation event occurred in the preceding week as it is for isolated extreme precipitation events. The influence of temporal clustering on discharge decreases as the clustering window increases; beyond 6–8 weeks the difference in discharge response with non-clustered events is negligible. Catchment area, streamflow regime and precipitation magnitude also modulate the response. The impact of clustering is generally smaller in snow-dominated and large catchments. Additionally, particularly persistent periods of high discharge tend to occur in conjunction with temporal clusters of precipitation extremes.
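The clustering rule described in the abstract above (an extreme event counts as clustered when another extreme event occurred within a preceding time window) can be sketched as follows. The function name, the daily boolean input, and the example days are hypothetical; the sketch also ignores any declustering of consecutive wet days that a real analysis might apply first.

```python
import numpy as np

def flag_clustered_events(is_extreme, window_days):
    """For each extreme day, report whether another extreme day occurred
    within the preceding `window_days` days."""
    is_extreme = np.asarray(is_extreme, dtype=bool)
    event_days = np.flatnonzero(is_extreme)      # indices of extreme days
    clustered = np.zeros(event_days.size, dtype=bool)
    if event_days.size > 1:
        gaps = np.diff(event_days)               # days since the previous event
        clustered[1:] = gaps <= window_days      # the first event has no predecessor
    return event_days, clustered

# Hypothetical one-year daily series with extreme precipitation on five days.
is_extreme = np.zeros(365, dtype=bool)
is_extreme[[3, 6, 40, 45, 100]] = True

event_days, clustered = flag_clustered_events(is_extreme, window_days=7)
for day, flag in zip(event_days, clustered):
    print(day, "clustered" if flag else "isolated")
```

With a one-week window, days 6 and 45 are flagged as clustered because each follows another event by at most 7 days, while days 3, 40 and 100 are isolated. Widening `window_days` turns more events into clustered ones, which is the parameter the abstract varies up to 6–8 weeks.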

    Insights into the drivers and spatio-temporal trends of extreme Mediterranean wildfires with statistical deep-learning

    Extreme wildfires are a significant cause of human death and biodiversity destruction within countries that encompass the Mediterranean Basin. Recent worrying trends in wildfire activity (i.e., occurrence and spread) suggest that wildfires are likely to be highly impacted by climate change. In order to facilitate appropriate risk mitigation, we must identify the main drivers of extreme wildfires and assess their spatio-temporal trends, with a view to understanding the impacts of global warming on fire activity. We analyse the monthly burnt area due to wildfires over a region encompassing most of Europe and the Mediterranean Basin from 2001 to 2020, and identify high fire activity during this period in Algeria, Italy and Portugal. We build an extreme quantile regression model with a high-dimensional predictor set describing meteorological conditions, land cover usage, and orography. To model the complex relationships between the predictor variables and wildfires, we use a hybrid statistical deep-learning framework that can disentangle the effects of vapour-pressure deficit (VPD), air temperature, and drought on wildfire activity. Our results highlight that whilst VPD, air temperature, and drought significantly affect wildfire occurrence, only VPD affects wildfire spread. To gain insights into the effect of climate trends on wildfires in the near future, we focus on August 2001 and perturb temperature according to its observed trends (median over Europe: +0.04 K per year). We find that, on average over Europe, these trends lead to a relative increase of 17.1% and 1.6% in the expected frequency and severity, respectively, of wildfires in August 2001, with spatially non-uniform changes in both aspects.

    Hotspots and drivers of compound marine heatwaves and low net primary production extremes

    Extreme events can severely impact marine organisms and ecosystems. Of particular concern are multivariate compound events, namely when conditions are simultaneously extreme for multiple ocean ecosystem stressors. In 2013–2015 for example, an extensive marine heatwave (MHW), known as the Blob, co-occurred locally with extremely low net primary productivity (NPPX) and negatively impacted marine life in the northeast Pacific. Yet, little is known about the characteristics and drivers of such multivariate compound MHW–NPPX events. Using five different satellite-derived net primary productivity (NPP) estimates and large-ensemble-simulation output of two widely used and comprehensive Earth system models, the Geophysical Fluid Dynamics Laboratory (GFDL) ESM2M-LE and Community Earth System Model version 2 (CESM2-LE), we assess the present-day distribution of compound MHW–NPPX events and investigate their potential drivers on the global scale. The satellite-based estimates and both models reveal hotspots of frequent compound events in the center of the equatorial Pacific and in the subtropical Indian Ocean, where their occurrence is at least 3 times higher (more than 10 d yr⁻¹) than if MHWs (temperature above the seasonally varying 90th-percentile threshold) and NPPX events (NPP below the seasonally varying 10th-percentile threshold) were to occur independently. However, the models show disparities in the northern high latitudes, where compound events are rare in the satellite-based estimates and GFDL ESM2M-LE (less than 3 d yr⁻¹) but relatively frequent in CESM2-LE. In the Southern Ocean south of 60° S, low agreement between the observation-based estimates makes it difficult to determine which of the two models better simulates MHW–NPPX events. The frequency patterns can be explained by the drivers of compound events, which vary among the two models and phytoplankton types. In the low latitudes, MHWs are associated with enhanced nutrient limitation on phytoplankton growth, which results in frequent compound MHW–NPPX events in both models. In the high latitudes, NPPX events in GFDL ESM2M-LE are driven by enhanced light limitation, which rarely co-occurs with MHWs, resulting in rare compound events. In contrast, in CESM2-LE, NPPX events in the high latitudes are driven by reduced nutrient supply that often co-occurs with MHWs, moderates phytoplankton growth, and causes biomass to decrease. Compound MHW–NPPX events are associated with a relative shift towards larger phytoplankton in most regions, except in the eastern equatorial Pacific in both models, as well as in the northern high latitudes and between 35 and 50° S in CESM2-LE, where the models suggest a shift towards smaller phytoplankton, with potential repercussions on marine ecosystems. Overall, our analysis reveals that the likelihood of compound MHW–NPPX events is contingent on model representation of the factors limiting phytoplankton production. This identifies an important need for improved process understanding in Earth system models used for predicting and projecting compound MHW–NPPX events and their impacts.
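The independence baseline behind the "at least 3 times higher (more than 10 d yr⁻¹)" figure in the abstract above can be checked with a short calculation. This is an illustrative sketch only: it assumes a 365-day year, takes the 90th and 10th percentile thresholds to mean that each extreme occupies 10 % of days, and the helper `co_occurrence_ratio` is a hypothetical name, not something from the paper.

```python
import numpy as np

DAYS_PER_YEAR = 365   # assumption: non-leap year

# By construction of the percentile thresholds, each extreme covers 10 % of days.
p_mhw = 0.10          # temperature above the 90th percentile
p_nppx = 0.10         # NPP below the 10th percentile

# If the two extremes were independent, the joint probability is the product.
independent_days = p_mhw * p_nppx * DAYS_PER_YEAR   # about 3.65 d per year
hotspot_days = 3 * independent_days                 # about 11 d per year

def co_occurrence_ratio(mhw, nppx):
    """Observed joint frequency divided by the frequency expected under
    independence (hypothetical helper for illustration)."""
    mhw = np.asarray(mhw, dtype=bool)
    nppx = np.asarray(nppx, dtype=bool)
    return np.mean(mhw & nppx) / (mhw.mean() * nppx.mean())

# Tiny made-up series: 10 days, MHW on 2 of them, NPPX on 1, overlapping once.
mhw_days = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
nppx_days = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
ratio = co_occurrence_ratio(mhw_days, nppx_days)

print(round(independent_days, 2), round(hotspot_days, 2), round(ratio, 2))
```

Under these assumptions, independent occurrence gives roughly 3.65 compound days per year, so three times that baseline is about 11 d per year, consistent with the "more than 10 d yr⁻¹" quoted for the hotspots. The "less than 3 d yr⁻¹" reported for the northern high latitudes sits below the same baseline.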